Like fingerprints, cortical folding patterns are unique to each brain even though they follow a general species-specific organization. Some folding patterns have been linked with neurodevelopmental disorders. However, due to the high inter-individual variability, the identification of rare folding patterns that could become biomarkers remains a very complex task. This paper proposes a novel unsupervised deep learning approach to identify rare folding patterns and assess the degree of deviations that can be detected. To this end, we preprocess the brain MR images to focus the learning on the folding morphology and train a beta-VAE to model the inter-individual variability of the folding. We compare the detection power of the latent space and of the reconstruction errors, using synthetic benchmarks and one actual rare configuration related to the central sulcus. Finally, we assess the generalization of our method on a developmental anomaly located in another region. Our results suggest that this method enables encoding relevant folding characteristics that can be enlightened and better interpreted based on the generative power of the beta-VAE. The latent space and the reconstruction errors bring complementary information and enable the identification of rare patterns of different nature. This method generalizes well to a different region on another dataset. Code is available at https://github.com/neurospin-projects/2022_lguillon_rare_folding_detection.
translated by 谷歌翻译
Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally-inefficient and memory-hungry; bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do no match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first fast and widely-applicable pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall filters out all off-target reads before basecalling; and the highly-accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, sensitivity, and generality compared to prior works. We freely open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall.
translated by 谷歌翻译
Classical graph algorithms work well for combinatorial problems that can be thoroughly formalized and abstracted. Once the algorithm is derived, it generalizes to instances of any size. However, developing an algorithm that handles complex structures and interactions in the real world can be challenging. Rather than specifying the algorithm, we can try to learn it from the graph-structured data. Graph Neural Networks (GNNs) are inherently capable of working on graph structures; however, they struggle to generalize well, and learning on larger instances is challenging. In order to scale, we focus on a recurrent architecture design that can learn simple graph problems end to end on smaller graphs and then extrapolate to larger instances. As our main contribution, we identify three essential techniques for recurrent GNNs to scale. By using (i) skip connections, (ii) state regularization, and (iii) edge convolutions, we can guide GNNs toward extrapolation. This allows us to train on small graphs and apply the same model to much larger graphs during inference. Moreover, we empirically validate the extrapolation capabilities of our GNNs on algorithmic datasets.
translated by 谷歌翻译
尽管存在许多减少卷积神经网络(CNN)过度拟合的方法,但仍不清楚如何自信地衡量过度拟合的程度。但是,反映过度拟合水平的度量可能非常有用,可对不同体系结构的比较和评估各种技术来应对过度拟合。由于过度拟合的神经网络倾向于记住训练数据中的噪声而不是普遍看不见的数据,因此我们研究了训练精度在增加数据扰动的存在并研究与过度拟合的联系时如何变化。尽管以前的工作仅针对标签噪声,但我们还是研究了一系列技术,以将噪声注入训练数据,包括对抗性扰动和输入损坏。基于此,我们定义了两个新的指标,可以自信地区分正确的模型和过度拟合模型。为了进行评估,我们得出了事先已知过度拟合行为的模型池。为了测试各种因素的效果,我们基于VGG和Resnet引入了架构中的几种反拟合措施,并研究其影响,包括正则化技术,训练集大小和参数数量。最后,我们通过测量模型池外几个CNN体系结构的过度拟合度来评估所提出的指标的适用性。
translated by 谷歌翻译